Credit risk modeling is the process of assessing the likelihood that a borrower will default on a loan or fail to repay debt. It is an essential component of the lending process for banks and other financial institutions, helping to determine the creditworthiness of borrowers, the risk of default, and the appropriate interest rates to charge.
In recent years, there has been a growing interest in using machine learning and artificial intelligence (AI) techniques to improve credit risk modeling. These techniques can analyze large datasets and identify patterns that may not be evident in traditional statistical models, leading to more accurate risk assessments and better lending decisions.
Overall, credit risk modeling plays a crucial role in the financial industry, helping lenders to assess the risk of default and make informed decisions about lending to specific borrowers. As the financial industry continues to evolve and new technologies emerge, credit risk modeling will continue to be an essential component of the lending process.
Quick review
Expected Loss (EL) is the amount of money a lender can expect to lose on average over the life of a loan due to default. It takes into account the probability of default, the exposure at default, and the loss given default. Here's how to calculate the Expected Loss:
EL = PD x LGD x EAD
Where:
- PD (Probability of Default) — e.g., 10%, estimated by a machine learning model (classification problem)
- LGD (Loss Given Default) = (Total exposure - Recoveries) / Total exposure = (USD 100,000 - USD 20,000) / USD 100,000 = 80%
- EAD (Exposure at Default) = Total exposure x (1 - Recovery rate) = USD 100,000 x (1 - 0.20) = USD 80,000

To calculate the Expected Loss, you need to estimate each of these components from historical data. For example, suppose a lender has a USD 100,000 loan to a borrower with a probability of default of 10%, a loss given default of 80%, and an exposure at default of USD 80,000. Using the formula and numbers above, the Expected Loss would be:
EL = 10% x 80% x USD 80,000 = USD 6,400
This means that the lender can expect to lose USD 6,400 on average over the life of the loan due to default. The Expected Loss is an important metric for lenders because it helps them estimate the amount of risk they are taking on and set appropriate loan pricing and risk management strategies.
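The arithmetic above can be sketched as a small Python helper (a minimal illustration; the function name `expected_loss` and the variable names are ours, not part of the notebook):

```python
def expected_loss(pd_, lgd, ead):
    """Expected Loss = PD x LGD x EAD (pd_ is named to avoid shadowing pandas' pd)."""
    return pd_ * lgd * ead

# Reproduce the worked example: USD 100,000 exposure, USD 20,000 recovered
total_exposure = 100_000.0
recoveries = 20_000.0
lgd = (total_exposure - recoveries) / total_exposure      # 0.80
ead = total_exposure * (1 - recoveries / total_exposure)  # 80,000.0
el = expected_loss(0.10, lgd, ead)
print(round(el, 2))  # 6400.0
```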
import os
import h2o
from pyspark.sql import SparkSession
import pandas as pd
# from deltalake import DeltaTable
## Metrics evaluation
from pyspark.ml.evaluation import RegressionEvaluator
## Sklearn Metrics
from sklearn.metrics import (confusion_matrix, classification_report, accuracy_score,
                             roc_auc_score, recall_score)
import warnings
warnings.filterwarnings('ignore')
def fshape(dataframe1):
    print('Shape : ', dataframe1.count(), len(dataframe1.columns))

def fhead(dataframe1, num_records=3):
    pd.options.display.max_columns = None
    return dataframe1.limit(num_records).toPandas()

def fsummary(dataframe1):
    return dataframe1.summary().toPandas()
## default Spark appName - adjust if preferred
spark = SparkSession.builder.appName('Spark3-ML-quick-app').master('local[*]').getOrCreate()
sc = spark.sparkContext
spark
SparkSession - in-memory
h2o.connect(ip='172.25.238.198')
h2o.remove_all()
Connecting to H2O server at http://172.25.238.198:54321 ... successful.
Warning: Your H2O cluster version is too old (3 months and 24 days)! Please download and install the latest version from http://h2o.ai/download/
| H2O_cluster_uptime: | 2 hours 42 mins |
| H2O_cluster_timezone: | America/Sao_Paulo |
| H2O_data_parsing_timezone: | UTC |
| H2O_cluster_version: | 3.38.0.4 |
| H2O_cluster_version_age: | 3 months and 24 days !!! |
| H2O_cluster_name: | userds1 |
| H2O_cluster_total_nodes: | 1 |
| H2O_cluster_free_memory: | 5.009 Gb |
| H2O_cluster_total_cores: | 12 |
| H2O_cluster_allowed_cores: | 12 |
| H2O_cluster_status: | locked, healthy |
| H2O_connection_url: | http://172.25.238.198:54321 |
| H2O_connection_proxy: | null |
| H2O_internal_security: | False |
| Python_version: | 3.9.13 final |
data_dir = '/tmp/Credit_Risk_Modeling/dat1_raw/'
sdf1_loan = spark.read.parquet(data_dir + 'dat1_loan.2M__1.2GB.FULL_FILE_WITH_CONTRACT.parquet/')
fshape(sdf1_loan)
# sdf1_loan.printSchema()
fhead(sdf1_loan)
Shape : 2260668 146
| contract_id | acc_now_delinq | acc_open_past_24mths | addr_state | all_util | annual_inc | annual_inc_joint | application_type | avg_cur_bal | bc_open_to_buy | bc_util | chargeoff_within_12_mths | collection_recovery_fee | collections_12_mths_ex_med | debt_settlement_flag | debt_settlement_flag_date | deferral_term | delinq_2yrs | delinq_amnt | desc | disbursement_method | dti | dti_joint | earliest_cr_line | emp_length | emp_title | funded_amnt | funded_amnt_inv | grade | hardship_amount | hardship_dpd | hardship_end_date | hardship_flag | hardship_last_payment_amount | hardship_length | hardship_loan_status | hardship_payoff_balance_amount | hardship_reason | hardship_start_date | hardship_status | hardship_type | home_ownership | id | il_util | initial_list_status | inq_fi | inq_last_12m | inq_last_6mths | installment | int_rate | issue_d | last_credit_pull_d | last_pymnt_amnt | last_pymnt_d | loan_amnt | loan_status | max_bal_bc | member_id | mo_sin_old_il_acct | mo_sin_old_rev_tl_op | mo_sin_rcnt_rev_tl_op | mo_sin_rcnt_tl | mort_acc | mths_since_last_delinq | mths_since_last_major_derog | mths_since_last_record | mths_since_rcnt_il | mths_since_recent_bc | mths_since_recent_bc_dlq | mths_since_recent_inq | mths_since_recent_revol_delinq | next_pymnt_d | num_accts_ever_120_pd | num_actv_bc_tl | num_actv_rev_tl | num_bc_sats | num_bc_tl | num_il_tl | num_op_rev_tl | num_rev_accts | num_rev_tl_bal_gt_0 | num_sats | num_tl_120dpd_2m | num_tl_30dpd | num_tl_90g_dpd_24m | num_tl_op_past_12m | open_acc | open_acc_6m | open_act_il | open_il_12m | open_il_24m | open_rv_12m | open_rv_24m | orig_projected_additional_accrued_interest | out_prncp | out_prncp_inv | payment_plan_start_date | pct_tl_nvr_dlq | percent_bc_gt_75 | policy_code | pub_rec | pub_rec_bankruptcies | purpose | pymnt_plan | recoveries | revol_bal | revol_bal_joint | revol_util | sec_app_chargeoff_within_12_mths | sec_app_collections_12_mths_ex_med | sec_app_earliest_cr_line | sec_app_inq_last_6mths | 
sec_app_mort_acc | sec_app_mths_since_last_major_derog | sec_app_num_rev_accts | sec_app_open_acc | sec_app_open_act_il | sec_app_revol_util | settlement_amount | settlement_date | settlement_percentage | settlement_status | settlement_term | sub_grade | tax_liens | term | title | tot_coll_amt | tot_cur_bal | tot_hi_cred_lim | total_acc | total_bal_ex_mort | total_bal_il | total_bc_limit | total_cu_tl | total_il_high_credit_limit | total_pymnt | total_pymnt_inv | total_rec_int | total_rec_late_fee | total_rec_prncp | total_rev_hi_lim | url | verification_status | verification_status_joint | zip_code | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 42949672960 | 0 | 9 | NY | 28 | 55000.0 | NaN | Individual | 1878 | 34360 | 5.9 | 0 | 0.0 | 0 | N | None | NaN | 0 | 0 | None | Cash | 18.24 | NaN | Apr-2001 | 10+ years | Chef | 2500 | 2500.0 | C | NaN | NaN | None | N | NaN | NaN | None | NaN | None | None | None | None | RENT | NaN | 69 | w | 1 | 2 | 1 | 84.92 | 13.56 | Dec-2018 | Feb-2019 | 84.92 | Feb-2019 | 2500 | Current | 2137 | NaN | 140 | 212 | 1 | 1 | 0 | NaN | NaN | 45.0 | 2 | 1 | NaN | 2 | NaN | Mar-2019 | 0 | 2 | 5 | 3 | 3 | 16 | 7 | 18 | 5 | 9 | 0 | 0 | 0 | 3 | 9 | 2 | 2 | 1 | 2 | 2 | 7 | NaN | 2386.02 | 2386.02 | None | 100.0 | 0.0 | True | 1 | 1 | debt_consolidation | n | 0.0 | 4341 | NaN | 10.3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | None | NaN | None | NaN | C1 | 0 | 36 months | Debt consolidation | 0 | 16901 | 60124 | 34 | 16901 | 12560 | 36500 | 11 | 18124 | 167.02 | 167.02 | 53.04 | 0.0 | 113.98 | 42000 | NaN | Not Verified | 109xx | ||
| 1 | 42949672961 | 0 | 10 | LA | 57 | 90000.0 | NaN | Individual | 24763 | 13761 | 8.3 | 0 | 0.0 | 0 | N | None | NaN | 0 | 0 | None | Cash | 26.52 | NaN | Jun-1987 | 10+ years | Postmaster | 30000 | 30000.0 | D | NaN | NaN | None | N | NaN | NaN | None | NaN | None | None | None | None | MORTGAGE | NaN | 88 | w | 2 | 2 | 0 | 777.23 | 18.94 | Dec-2018 | Feb-2019 | 777.23 | Feb-2019 | 30000 | Current | 998 | NaN | 163 | 378 | 4 | 3 | 3 | 71.0 | NaN | 75.0 | 3 | 4 | NaN | 4 | NaN | Mar-2019 | 0 | 2 | 4 | 4 | 9 | 27 | 8 | 14 | 4 | 13 | 0 | 0 | 0 | 6 | 13 | 4 | 4 | 2 | 3 | 4 | 5 | NaN | 29387.75 | 29387.75 | None | 95.0 | 0.0 | True | 1 | 1 | debt_consolidation | n | 0.0 | 12315 | NaN | 24.2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | None | NaN | None | NaN | D2 | 0 | 60 months | Debt consolidation | 1208 | 321915 | 372872 | 44 | 99468 | 87153 | 15000 | 15 | 94072 | 1507.11 | 1507.11 | 894.86 | 0.0 | 612.25 | 50800 | NaN | Source Verified | 713xx | ||
| 2 | 42949672962 | 0 | 4 | MI | 35 | 59280.0 | NaN | Individual | 18383 | 13800 | 0.0 | 0 | 0.0 | 0 | N | None | NaN | 0 | 0 | None | Cash | 10.51 | NaN | Apr-2011 | 6 years | Administrative | 5000 | 5000.0 | D | NaN | NaN | None | N | NaN | NaN | None | NaN | None | None | None | None | MORTGAGE | NaN | 72 | w | 1 | 0 | 0 | 180.69 | 17.97 | Dec-2018 | Feb-2019 | 180.69 | Feb-2019 | 5000 | Current | 0 | NaN | 87 | 92 | 15 | 14 | 2 | NaN | NaN | NaN | 14 | 77 | NaN | 14 | NaN | Mar-2019 | 0 | 0 | 3 | 3 | 3 | 4 | 6 | 7 | 3 | 8 | 0 | 0 | 0 | 0 | 8 | 0 | 1 | 0 | 2 | 0 | 2 | NaN | 4787.21 | 4787.21 | None | 100.0 | 0.0 | True | 0 | 0 | debt_consolidation | n | 0.0 | 4599 | NaN | 19.1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | None | NaN | None | NaN | D1 | 0 | 36 months | Debt consolidation | 0 | 110299 | 136927 | 13 | 11749 | 7150 | 13800 | 5 | 10000 | 353.89 | 353.89 | 141.10 | 0.0 | 212.79 | 24100 | NaN | Source Verified | 490xx |
Start data engineering with Spark
## RUN LEGACY TO_DATE()
sdf1_loan.createOrReplaceTempView('TBP_LOAN_RAW')
# set legacy timestamp parser policy
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
sql_tbp_loan_raw_to_silver = """
WITH TBP_SILVER AS (
SELECT
contract_id,
acc_now_delinq,
addr_state,
annual_inc,
delinq_2yrs,
CASE WHEN dti < 0 THEN 0
WHEN dti IS NULL THEN 0
ELSE dti
END dti,
earliest_cr_line,
TO_DATE(earliest_cr_line, 'MMMM-yyyy') AS earliest_cr_line_DT,
CAST (SUBSTRING(earliest_cr_line, length(earliest_cr_line) - 3, 4) AS INT) earliest_cr_line_year,
emp_length,
CAST ( REPLACE ( REPLACE ( REPLACE (REPLACE( REPLACE ( REPLACE(emp_length, '+ years', ''), 'years', '')
, '< 1 year', '0') , 'year', '') , ' ', ''), 'n/a', '0') AS INT) emp_length_int,
funded_amnt,
funded_amnt_inv,
grade,
home_ownership,
initial_list_status,
inq_last_6mths,
installment,
int_rate,
issue_d,
TO_DATE(issue_d, 'MMMM-yyyy') AS issue_d_DT,
CAST (SUBSTRING(issue_d, length(issue_d) - 3, 4) AS INT) issue_d_year,
loan_amnt,
loan_status,
CASE WHEN loan_status IN ('Charged Off', 'Default', 'Late (31-120 days)',
'Does not meet the credit policy. Status:Charged Off' )
THEN '0'
ELSE '1'
END AS loan_status_good_vs_bad,
CASE
WHEN mths_since_last_delinq IS NOT NULL THEN mths_since_last_delinq
ELSE 0
END mths_since_last_delinq,
CASE
WHEN mths_since_last_record IS NOT NULL THEN mths_since_last_record
ELSE 0
END mths_since_last_record,
purpose,
recoveries,
term,
CAST(REPLACE(term, 'months', '') AS INT) term_int,
verification_status,
zip_code,
-- REPORT ONLY
emp_title,
chargeoff_within_12_mths,
last_pymnt_amnt,
last_pymnt_d,
next_pymnt_d,
title,
total_acc,
-- ML AND SCORECARD
-- contract_id,
total_pymnt,
total_rec_prncp,
ROUND(recoveries / funded_amnt, 3) as recovery_rate,
(funded_amnt - total_rec_prncp) / funded_amnt as credit_conversion_factor_CCF
-- COMPLEMENT COLS
,sub_grade
,open_acc
,pub_rec
,total_rev_hi_lim
FROM TBP_LOAN_RAW
WHERE 1 = 1
-- AND issue_d LIKE '%2014'
)
SELECT CASE
WHEN recovery_rate > 1 THEN 1
WHEN recovery_rate < 0 THEN 0
ELSE recovery_rate
END as recovery_rate_pct
, ROUND( months_between(TO_DATE('2019-03-01', 'yyyy-MM-dd'), issue_d_DT), 1) as mths_since_issue_d
, ROUND(months_between(TO_DATE('2019-03-01', 'yyyy-MM-dd'), earliest_cr_line_DT), 1) as mths_since_earliest_credit_line
, TBP_SILVER.*
FROM TBP_SILVER
WHERE 1 = 1
AND issue_d_year in (2015, 2016, 2017, 2018)
"""
sdf2_silver = spark.sql(sql_tbp_loan_raw_to_silver)
# sdf2_silver.printSchema()
cols_sorted = sorted(set(sdf2_silver.columns))
initial_columns = ['contract_id', 'loan_status']
initial_columns.reverse()
cols_sorted = [col for col in cols_sorted if col not in initial_columns]
for col_idx in initial_columns:
cols_sorted.insert(0, col_idx)
cols_sorted
## Spark dataframe with columns sort
sdf2_silver = sdf2_silver[cols_sorted]
# sdf2_silver.printSchema()
sdf2_silver.groupBy('loan_status').count().show()
+------------------+------+
|       loan_status| count|
+------------------+------+
|        Fully Paid|668930|
|           Default|    31|
|   In Grace Period|  8716|
|       Charged Off|185263|
|Late (31-120 days)| 21537|
|           Current|906193|
| Late (16-30 days)|  3653|
+------------------+------+
sdf2_hdf = sdf2_silver.where(" loan_status = 'Current' ")
fshape(sdf2_hdf)
fhead(sdf2_hdf)
Shape : 906193 51
| contract_id | loan_status | acc_now_delinq | addr_state | annual_inc | chargeoff_within_12_mths | credit_conversion_factor_CCF | delinq_2yrs | dti | earliest_cr_line | earliest_cr_line_DT | earliest_cr_line_year | emp_length | emp_length_int | emp_title | funded_amnt | funded_amnt_inv | grade | home_ownership | initial_list_status | inq_last_6mths | installment | int_rate | issue_d | issue_d_DT | issue_d_year | last_pymnt_amnt | last_pymnt_d | loan_amnt | loan_status_good_vs_bad | mths_since_earliest_credit_line | mths_since_issue_d | mths_since_last_delinq | mths_since_last_record | next_pymnt_d | open_acc | pub_rec | purpose | recoveries | recovery_rate | recovery_rate_pct | sub_grade | term | term_int | title | total_acc | total_pymnt | total_rec_prncp | total_rev_hi_lim | verification_status | zip_code | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 42949672960 | Current | 0 | NY | 55000.0 | 0 | 0.954408 | 0 | 18.24 | Apr-2001 | 2001-04-01 | 2001 | 10+ years | 10 | Chef | 2500 | 2500.0 | C | RENT | w | 1 | 84.92 | 13.56 | Dec-2018 | 2018-12-01 | 2018 | 84.92 | Feb-2019 | 2500 | 1 | 215.0 | 3.0 | 0 | 45 | Mar-2019 | 9 | 1 | debt_consolidation | 0.0 | 0.0 | 0.0 | C1 | 36 months | 36 | Debt consolidation | 34 | 167.02 | 113.98 | 42000 | Not Verified | 109xx |
| 1 | 42949672961 | Current | 0 | LA | 90000.0 | 0 | 0.979592 | 0 | 26.52 | Jun-1987 | 1987-06-01 | 1987 | 10+ years | 10 | Postmaster | 30000 | 30000.0 | D | MORTGAGE | w | 0 | 777.23 | 18.94 | Dec-2018 | 2018-12-01 | 2018 | 777.23 | Feb-2019 | 30000 | 1 | 381.0 | 3.0 | 71 | 75 | Mar-2019 | 13 | 1 | debt_consolidation | 0.0 | 0.0 | 0.0 | D2 | 60 months | 60 | Debt consolidation | 44 | 1507.11 | 612.25 | 50800 | Source Verified | 713xx |
| 2 | 42949672962 | Current | 0 | MI | 59280.0 | 0 | 0.957442 | 0 | 10.51 | Apr-2011 | 2011-04-01 | 2011 | 6 years | 6 | Administrative | 5000 | 5000.0 | D | MORTGAGE | w | 0 | 180.69 | 17.97 | Dec-2018 | 2018-12-01 | 2018 | 180.69 | Feb-2019 | 5000 | 1 | 95.0 | 3.0 | 0 | 0 | Mar-2019 | 8 | 0 | debt_consolidation | 0.0 | 0.0 | 0.0 | D1 | 36 months | 36 | Debt consolidation | 13 | 353.89 | 212.79 | 24100 | Source Verified | 490xx |
## Used table for prediction
sdf2_hdf.createOrReplaceTempView('TBP_CREDIT_RISK_MODELING')
A wrapper function for classification metrics evaluation is a function that simplifies the process of evaluating the performance of a classification model. It provides a unified interface for calculating various metrics such as accuracy, precision, recall, and F1 score, which are commonly used to measure the effectiveness of a classification algorithm.
## Function to print Confusion Matrix and metrics
def rpt_metrics_report_CM(y_true, y_pred, msg_model=' model name ... ', rpt_confusion_matrix=False):
    """Print classification metrics and, optionally, the confusion matrix."""
    accuracy_score_rpt = accuracy_score(y_true, y_pred)
    recall_score_rpt = recall_score(y_true, y_pred)
    auc_rpt = roc_auc_score(y_true, y_pred)
    print('Model: ', msg_model)
    print('-- Accuracy: ', accuracy_score_rpt)
    print('-- AUC     : ', auc_rpt)
    print('-- Recall  : ', recall_score_rpt)
    print('')
    if rpt_confusion_matrix:
        report = classification_report(y_true, y_pred)
        confusion_matrix_rpt = confusion_matrix(y_true, y_pred)
        print('-- Confusion Matrix')
        print('TN FP')
        print('FN TP')
        print('')
        print(confusion_matrix_rpt)
        print('')
        print('-- Metrics report')
        print(report)
        print('')
def print_evaluation_fnc_classification_metrics(dataframe1, label='label', prediction='prediction'):
    from pyspark.ml.evaluation import BinaryClassificationEvaluator, MulticlassClassificationEvaluator
    ## note: the input columns are renamed to 'label' and 'prediction'
    df = dataframe1.select(label, prediction)
    df = df.toDF('label', 'prediction')
    # cast label and prediction columns to Double
    df = df.withColumn("label", df["label"].cast("Double"))
    df = df.withColumn("prediction", df["prediction"].cast("Double"))
    predictionsAndLabels = df.select("label", "prediction")
    # create BinaryClassificationEvaluator object
    # (note: rawPredictionCol receives the hard 0/1 prediction here, so the ROC/PR
    # areas are computed from binary labels rather than probabilities)
    binary_evaluator = BinaryClassificationEvaluator(labelCol="label", rawPredictionCol="prediction", metricName="areaUnderROC")
    # create MulticlassClassificationEvaluator object
    multiclass_evaluator = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction")
    # compute classification metrics for binary classification
    areaUnderROC = binary_evaluator.evaluate(predictionsAndLabels)
    areaUnderPR = binary_evaluator.setMetricName("areaUnderPR").evaluate(predictionsAndLabels)
    # compute classification metrics for multiclass classification
    accuracy = multiclass_evaluator.evaluate(predictionsAndLabels, {multiclass_evaluator.metricName: "accuracy"})
    precision = multiclass_evaluator.evaluate(predictionsAndLabels, {multiclass_evaluator.metricName: "weightedPrecision"})
    recall = multiclass_evaluator.evaluate(predictionsAndLabels, {multiclass_evaluator.metricName: "weightedRecall"})
    f1Score = multiclass_evaluator.evaluate(predictionsAndLabels, {multiclass_evaluator.metricName: "f1"})
    confusionMatrix = predictionsAndLabels.groupBy("label", "prediction").count().orderBy("label", "prediction").toPandas()
    # print classification metrics
    print("")
    print("Multiclass Classification Metrics:")
    print("")
    print("Accuracy  = %s" % accuracy)
    print("Precision = %s" % precision)
    print("Recall    = %s" % recall)
    print("F1 Score  = %s" % f1Score)
    print("")
    print("Confusion Matrix:")
    print(confusionMatrix)
    print("")
    print("\nBinary Classification Metrics:")
    print("")
    print("Area Under ROC = %s" % areaUnderROC)
    print("Area Under PR  = %s" % areaUnderPR)
# print_evaluation_fnc_classification_metrics(sdf_credit_score,label='loan_status_good_vs_bad', prediction='predict')
def print_evaluation_regression_metrics(pred_dataframeSpark_1, label_col_1='label', prediction_col_1='prediction'):
    print('--------------- Regression Metrics')
    print()
    evaluator = RegressionEvaluator(labelCol=label_col_1, predictionCol=prediction_col_1, metricName="r2")
    r2 = evaluator.evaluate(pred_dataframeSpark_1)
    print("R2 - coefficient of determination on test data = %g" % r2)
    print()
    # Select (prediction, true label) and compute test error
    evaluator = RegressionEvaluator(labelCol=label_col_1, predictionCol=prediction_col_1, metricName="rmse")
    rmse = evaluator.evaluate(pred_dataframeSpark_1)
    print("Root Mean Squared Error (RMSE) on test data = %g" % rmse)
    print()
    evaluator = RegressionEvaluator(labelCol=label_col_1, predictionCol=prediction_col_1, metricName="mae")
    mae = evaluator.evaluate(pred_dataframeSpark_1)
    print("Mean Absolute Error (MAE) on test data = %g" % mae)
    print()

# print_evaluation_regression_metrics(predictions_boston_house, 'MEDV', 'ZSCORE0')
h2o_model_dir = '/tmp/Credit_Risk_Modeling/h2o_glm_gbm_model/'
glm_model = h2o.load_model('/mnt/d/'+h2o_model_dir +'fit_glm_2015_2017.model')
def fnc_percent_print(metric):
    return round(metric * 100, 4)
# glm_model.model_performance()
print(' GLM model accuracy - ', round(glm_model.accuracy()[0][1] * 100 , 4), ' % ')
print(' --- Max precision of ' , fnc_percent_print(glm_model.find_threshold_by_max_metric(metric='precision')),
' % with threshold adjustment ')
 GLM model accuracy -  78.3499  %
 --- Max precision of  96.2257  % with threshold adjustment
## gbm model - Predict recovery rate pct for Current Loan
gbm_model = h2o.load_model('/mnt/d/'+h2o_model_dir +'gbm_model_recovery_rate_pct_EL_calculation.model')
gbm_model.model_performance()
ModelMetricsRegression: gbm
** Reported on train data. **

MSE: 0.00855472323108516
RMSE: 0.09249174682686645
MAE: 0.062329840365562966
RMSLE: 0.0782266478276316
Mean Residual Deviance: 0.00855472323108516
# ## Example to export with partition data by Year
# pq_spark_h2o_integration = '/tmp/zdata_s3/credit_risk_modeling/data_s3_credit_risk_modeling_2018.parquet'
# sdf2_hdf.write.format('parquet').mode('overwrite').partitionBy('issue_d_year').save(pq_spark_h2o_integration)
## Export as only one file sample
pq_spark_h2o_integration_workaround = '/tmp/zdata_s3/credit_risk_modeling/data_zs3_credit_risk_modeling_2018_01.parquet'
sdf2_hdf.coalesce(1).write.format('parquet').mode('overwrite').save(pq_spark_h2o_integration_workaround +'.one_file.parquet')
dir_path = pq_spark_h2o_integration_workaround+'.one_file.parquet' + '/'
parquet_files = []
# Get a list of all the Parquet files in the directory
parquet_files = [os.path.join(dir_path, f) for f in os.listdir(dir_path) if f.endswith(".parquet")]
# parquet_files[0]
h2o_file = parquet_files[0]
hdf_sdf2 = h2o.upload_file(h2o_file, destination_frame='hdf_loan_credit_risk_2018.hex')
Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%
hdf_sdf2.head(3)
| contract_id | loan_status | acc_now_delinq | addr_state | annual_inc | chargeoff_within_12_mths | credit_conversion_factor_CCF | delinq_2yrs | dti | earliest_cr_line | earliest_cr_line_DT | earliest_cr_line_year | emp_length | emp_length_int | emp_title | funded_amnt | funded_amnt_inv | grade | home_ownership | initial_list_status | inq_last_6mths | installment | int_rate | issue_d | issue_d_DT | issue_d_year | last_pymnt_amnt | last_pymnt_d | loan_amnt | loan_status_good_vs_bad | mths_since_earliest_credit_line | mths_since_issue_d | mths_since_last_delinq | mths_since_last_record | next_pymnt_d | open_acc | pub_rec | purpose | recoveries | recovery_rate | recovery_rate_pct | sub_grade | term | term_int | title | total_acc | total_pymnt | total_rec_prncp | total_rev_hi_lim | verification_status | zip_code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4.29497e+10 | Current | 0 | NY | 55000 | 0 | 0.954408 | 0 | 18.24 | 2001-04-01 00:00:00 | 2001-04-01 00:00:00 | 2001 | 10+ years | 10 | Chef | 2500 | 2500 | C | RENT | w | 1 | 84.92 | 13.56 | 2018-12-01 00:00:00 | 2018-12-01 00:00:00 | 2018 | 84.92 | 2019-02-01 00:00:00 | 2500 | 1 | 215 | 3 | 0 | 45 | 2019-03-01 00:00:00 | 9 | 1 | debt_consolidation | 0 | 0 | 0 | C1 | 36 months | 36 | Debt consolidation | 34 | 167.02 | 113.98 | 42000 | Not Verified | 109xx |
| 4.29497e+10 | Current | 0 | LA | 90000 | 0 | 0.979592 | 0 | 26.52 | 1987-06-01 00:00:00 | 1987-06-01 00:00:00 | 1987 | 10+ years | 10 | Postmaster | 30000 | 30000 | D | MORTGAGE | w | 0 | 777.23 | 18.94 | 2018-12-01 00:00:00 | 2018-12-01 00:00:00 | 2018 | 777.23 | 2019-02-01 00:00:00 | 30000 | 1 | 381 | 3 | 71 | 75 | 2019-03-01 00:00:00 | 13 | 1 | debt_consolidation | 0 | 0 | 0 | D2 | 60 months | 60 | Debt consolidation | 44 | 1507.11 | 612.25 | 50800 | Source Verified | 713xx |
| 4.29497e+10 | Current | 0 | MI | 59280 | 0 | 0.957442 | 0 | 10.51 | 2011-04-01 00:00:00 | 2011-04-01 00:00:00 | 2011 | 6 years | 6 | Administrative | 5000 | 5000 | D | MORTGAGE | w | 0 | 180.69 | 17.97 | 2018-12-01 00:00:00 | 2018-12-01 00:00:00 | 2018 | 180.69 | 2019-02-01 00:00:00 | 5000 | 1 | 95 | 3 | 0 | 0 | 2019-03-01 00:00:00 | 8 | 0 | debt_consolidation | 0 | 0 | 0 | D1 | 36 months | 36 | Debt consolidation | 13 | 353.89 | 212.79 | 24100 | Source Verified | 490xx |
[3 rows x 51 columns]
hdf_glm_predict = glm_model.predict(hdf_sdf2)
glm prediction progress: |███████████████████████████████████████████████████████| (done) 100%
hdf_glm_predict.head(3)
| predict | p0 | p1 |
|---|---|---|
| 1 | 0.233671 | 0.766329 |
| 1 | 0.446381 | 0.553619 |
| 1 | 0.208331 | 0.791669 |
[3 rows x 3 columns]
hdf_gbm = gbm_model.predict(hdf_sdf2)
gbm prediction progress: |███████████████████████████████████████████████████████| (done) 100%
hdf_gbm.head(3)
| predict |
|---|
| 0.0709352 |
| 0.079283 |
| 0.0781656 |
[3 rows x 1 column]
# test
hdf_sdf3_gold = h2o.deep_copy(hdf_sdf2['contract_id'], xid='hdf_loan_credit_risk_2018_EL.hex')
## Recovery rate with GBM prediction
hdf_sdf3_gold['recovery_rate_pct_predict_gbm'] = hdf_gbm['predict']
hdf_sdf3_gold.head(3)
| contract_id | recovery_rate_pct_predict_gbm |
|---|---|
| 4.29497e+10 | 0.0709352 |
| 4.29497e+10 | 0.079283 |
| 4.29497e+10 | 0.0781656 |
[3 rows x 2 columns]
## Concatenate h2o frames with prediction - GLM model prediction
hdf_sdf3_gold = hdf_sdf3_gold.concat(hdf_glm_predict, axis=1)
hdf_sdf3_gold.head(3)
| contract_id | recovery_rate_pct_predict_gbm | predict | p0 | p1 |
|---|---|---|---|---|
| 4.29497e+10 | 0.0709352 | 1 | 0.233671 | 0.766329 |
| 4.29497e+10 | 0.079283 | 1 | 0.446381 | 0.553619 |
| 4.29497e+10 | 0.0781656 | 1 | 0.208331 | 0.791669 |
[3 rows x 5 columns]
## Export file
h2o.export_file(frame=hdf_sdf3_gold, path='/tmp/credit_risk_modeling_h2o_2018.csv.gz', compression='gzip', force=True)
Export File progress: |██████████████████████████████████████████████████████████| (done) 100%
sdf_glm_gbm = spark.read.csv('/tmp/credit_risk_modeling_h2o_2018.csv.gz', inferSchema=True, header=True)
sdf_glm_gbm.printSchema()
root
 |-- contract_id: long (nullable = true)
 |-- recovery_rate_pct_predict_gbm: double (nullable = true)
 |-- predict: integer (nullable = true)
 |-- p0: double (nullable = true)
 |-- p1: double (nullable = true)
fhead(sdf_glm_gbm)
| contract_id | recovery_rate_pct_predict_gbm | predict | p0 | p1 | |
|---|---|---|---|---|---|
| 0 | 42949672960 | 0.070935 | 1 | 0.233671 | 0.766329 |
| 1 | 42949672961 | 0.079283 | 1 | 0.446381 | 0.553619 |
| 2 | 42949672962 | 0.078166 | 1 | 0.208331 | 0.791669 |
# fshape(sdf_glm_gbm)
# fshape(sdf2_hdf)
fhead(sdf2_hdf)
| contract_id | loan_status | acc_now_delinq | addr_state | annual_inc | chargeoff_within_12_mths | credit_conversion_factor_CCF | delinq_2yrs | dti | earliest_cr_line | earliest_cr_line_DT | earliest_cr_line_year | emp_length | emp_length_int | emp_title | funded_amnt | funded_amnt_inv | grade | home_ownership | initial_list_status | inq_last_6mths | installment | int_rate | issue_d | issue_d_DT | issue_d_year | last_pymnt_amnt | last_pymnt_d | loan_amnt | loan_status_good_vs_bad | mths_since_earliest_credit_line | mths_since_issue_d | mths_since_last_delinq | mths_since_last_record | next_pymnt_d | open_acc | pub_rec | purpose | recoveries | recovery_rate | recovery_rate_pct | sub_grade | term | term_int | title | total_acc | total_pymnt | total_rec_prncp | total_rev_hi_lim | verification_status | zip_code | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 42949672960 | Current | 0 | NY | 55000.0 | 0 | 0.954408 | 0 | 18.24 | Apr-2001 | 2001-04-01 | 2001 | 10+ years | 10 | Chef | 2500 | 2500.0 | C | RENT | w | 1 | 84.92 | 13.56 | Dec-2018 | 2018-12-01 | 2018 | 84.92 | Feb-2019 | 2500 | 1 | 215.0 | 3.0 | 0 | 45 | Mar-2019 | 9 | 1 | debt_consolidation | 0.0 | 0.0 | 0.0 | C1 | 36 months | 36 | Debt consolidation | 34 | 167.02 | 113.98 | 42000 | Not Verified | 109xx |
| 1 | 42949672961 | Current | 0 | LA | 90000.0 | 0 | 0.979592 | 0 | 26.52 | Jun-1987 | 1987-06-01 | 1987 | 10+ years | 10 | Postmaster | 30000 | 30000.0 | D | MORTGAGE | w | 0 | 777.23 | 18.94 | Dec-2018 | 2018-12-01 | 2018 | 777.23 | Feb-2019 | 30000 | 1 | 381.0 | 3.0 | 71 | 75 | Mar-2019 | 13 | 1 | debt_consolidation | 0.0 | 0.0 | 0.0 | D2 | 60 months | 60 | Debt consolidation | 44 | 1507.11 | 612.25 | 50800 | Source Verified | 713xx |
| 2 | 42949672962 | Current | 0 | MI | 59280.0 | 0 | 0.957442 | 0 | 10.51 | Apr-2011 | 2011-04-01 | 2011 | 6 years | 6 | Administrative | 5000 | 5000.0 | D | MORTGAGE | w | 0 | 180.69 | 17.97 | Dec-2018 | 2018-12-01 | 2018 | 180.69 | Feb-2019 | 5000 | 1 | 95.0 | 3.0 | 0 | 0 | Mar-2019 | 8 | 0 | debt_consolidation | 0.0 | 0.0 | 0.0 | D1 | 36 months | 36 | Debt consolidation | 13 | 353.89 | 212.79 | 24100 | Source Verified | 490xx |
## Create table to join all information
sdf_glm_gbm.createOrReplaceTempView('TB_CREDIT_RISK_PD_GLM_GBM')
sql_pd_glm_gbm = """
SELECT contract_id
, CASE
WHEN recovery_rate_pct_predict_gbm < 0 THEN 0
WHEN recovery_rate_pct_predict_gbm > 1 THEN 1
ELSE recovery_rate_pct_predict_gbm
END as recovery_rate_pct_predict_gbm
, CASE
WHEN predict = 1 THEN 'Full Payment'
WHEN predict = 0 THEN 'Default'
ELSE 'NOT MAPPED'
END loan_prediction_str
, predict as loan_prediction
, p0 as p0_PD
, p1 as p1
FROM TB_CREDIT_RISK_PD_GLM_GBM
WHERE 1 = 1
"""
sdf3_pd_glm_gbm = spark.sql(sql_pd_glm_gbm)
sdf3_pd_glm_gbm.createOrReplaceTempView('TB_CREDIT_RISK_PD_GLM_GBM_V2')
sdf3_pd_glm_gbm.printSchema()
root
 |-- contract_id: long (nullable = true)
 |-- recovery_rate_pct_predict_gbm: double (nullable = true)
 |-- loan_prediction_str: string (nullable = false)
 |-- loan_prediction: integer (nullable = true)
 |-- p0_PD: double (nullable = true)
 |-- p1: double (nullable = true)
# fhead(sdf3_pd_glm_gbm)
fsummary(sdf3_pd_glm_gbm)
| summary | contract_id | recovery_rate_pct_predict_gbm | loan_prediction_str | loan_prediction | p0_PD | p1 | |
|---|---|---|---|---|---|---|---|
| 0 | count | 906193 | 906193 | 906193 | 906193 | 906193 | 906193 |
| 1 | mean | 4.295048005718689E10 | 0.06782696133337138 | None | 0.9839294719778237 | 0.21968642917867556 | 0.7803135708213307 |
| 2 | stddev | 679831.7950687075 | 0.012169656607159591 | None | 0.1257469029455572 | 0.1360281936979018 | 0.13602819369790117 |
| 3 | min | 42949672960 | 0.0 | Default | 0 | 0.0 | 5.687117625267055E-6 |
| 4 | 25% | 42949912885 | 0.06054192915197007 | None | 1 | 0.1096859414454423 | 0.6992008950554914 |
| 5 | 50% | 42950207714 | 0.06833222666462632 | None | 1 | 0.19491164595872357 | 0.805032687793979 |
| 6 | 75% | 42951110500 | 0.07558022842620801 | None | 1 | 0.3007592422127676 | 0.8903088404354708 |
| 7 | max | 42951933627 | 0.1607236757877962 | Full Payment | 1 | 0.9999943128823747 | 1.0 |
sql_credit_modeling = """
WITH TBP_CREDIT_MODELING_V2 AS (
SELECT TBP_CREDIT_RISK_MODELING.*
, recovery_rate_pct_predict_gbm
, loan_prediction
, loan_prediction_str
, tb_credit_risk_pd_glm_gbm.p0_PD as PD
, TBP_CREDIT_RISK_MODELING.credit_conversion_factor_CCF as LGD
, ( (TBP_CREDIT_RISK_MODELING.funded_amnt - TBP_CREDIT_RISK_MODELING.total_rec_prncp ) *
( 1 - tb_credit_risk_pd_glm_gbm.recovery_rate_pct_predict_gbm)
) as EAD
FROM TBP_CREDIT_RISK_MODELING
JOIN TB_CREDIT_RISK_PD_GLM_GBM_V2 as TB_CREDIT_RISK_PD_GLM_GBM
  ON TBP_CREDIT_RISK_MODELING.contract_id = tb_credit_risk_pd_glm_gbm.contract_id
WHERE 1 = 1
)
SELECT TBP_CREDIT_MODELING_V2.*
, (PD * LGD * EAD ) as EL
FROM TBP_CREDIT_MODELING_V2
"""
sdf3_credit_risk_modeling = spark.sql(sql_credit_modeling)
fshape(sdf3_credit_risk_modeling)
Shape : 906193 58
fhead(sdf3_credit_risk_modeling)
| | contract_id | loan_status | acc_now_delinq | addr_state | annual_inc | chargeoff_within_12_mths | credit_conversion_factor_CCF | delinq_2yrs | dti | earliest_cr_line | earliest_cr_line_DT | earliest_cr_line_year | emp_length | emp_length_int | emp_title | funded_amnt | funded_amnt_inv | grade | home_ownership | initial_list_status | inq_last_6mths | installment | int_rate | issue_d | issue_d_DT | issue_d_year | last_pymnt_amnt | last_pymnt_d | loan_amnt | loan_status_good_vs_bad | mths_since_earliest_credit_line | mths_since_issue_d | mths_since_last_delinq | mths_since_last_record | next_pymnt_d | open_acc | pub_rec | purpose | recoveries | recovery_rate | recovery_rate_pct | sub_grade | term | term_int | title | total_acc | total_pymnt | total_rec_prncp | total_rev_hi_lim | verification_status | zip_code | recovery_rate_pct_predict_gbm | loan_prediction | loan_prediction_str | PD | LGD | EAD | EL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 42949672966 | Current | 0 | IL | 51000.0 | 0 | 0.957355 | 0 | 2.40 | Nov-2004 | 2004-11-01 | 2004 | 4 years | 4 | Account Manager | 2000 | 2000.0 | D | RENT | w | 1 | 72.28 | 17.97 | Dec-2018 | 2018-12-01 | 2018 | 72.28 | Feb-2019 | 2000 | 1 | 172.0 | 3.0 | 0 | 0 | Mar-2019 | 1 | 0 | debt_consolidation | 0.0 | 0.0 | 0.0 | D1 | 36 months | 36 | Debt consolidation | 9 | 141.56 | 85.29 | 0 | Source Verified | 606xx | 0.061044 | 1 | Full Payment | 0.263890 | 0.957355 | 1797.828366 | 454.196849 |
| 1 | 42949672976 | Current | 0 | OH | 102500.0 | 0 | 0.954014 | 0 | 15.20 | Dec-2002 | 2002-12-01 | 2002 | 4 years | 4 | Worship Director | 7000 | 7000.0 | B | MORTGAGE | w | 0 | 235.80 | 12.98 | Dec-2018 | 2018-12-01 | 2018 | 235.80 | Feb-2019 | 7000 | 1 | 195.0 | 3.0 | 38 | 0 | Mar-2019 | 9 | 0 | house | 0.0 | 0.0 | 0.0 | B5 | 36 months | 36 | Home buying | 21 | 464.03 | 321.90 | 14700 | Not Verified | 436xx | 0.078220 | 1 | Full Payment | 0.108663 | 0.954014 | 6155.741967 | 638.143929 |
| 2 | 42949673064 | Current | 0 | CA | 65000.0 | 0 | 0.979040 | 0 | 8.12 | Apr-1989 | 1989-04-01 | 1989 | 10+ years | 10 | senior buyer | 30000 | 30000.0 | D | MORTGAGE | w | 2 | 761.32 | 17.97 | Dec-2018 | 2018-12-01 | 2018 | 761.32 | Feb-2019 | 30000 | 1 | 359.0 | 3.0 | 29 | 104 | Mar-2019 | 15 | 1 | debt_consolidation | 0.0 | 0.0 | 0.0 | D1 | 60 months | 60 | Debt consolidation | 19 | 1477.72 | 628.81 | 18800 | Source Verified | 917xx | 0.070298 | 1 | Full Payment | 0.389139 | 0.979040 | 27306.453746 | 10403.294357 |
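The `EAD` and `EL` columns follow directly from the formulas in the query: EAD is the outstanding principal scaled by the non-recovered share, and EL = PD x LGD x EAD. Recomputing the first row of the output above by hand (values copied from that row as displayed, so small rounding differences are expected):

```python
# Values from the first row of the output (contract_id 42949672966)
funded_amnt = 2000.0
total_rec_prncp = 85.29
recovery_rate = 0.061044   # recovery_rate_pct_predict_gbm
PD = 0.263890              # p0_PD
LGD = 0.957355             # credit_conversion_factor_CCF

# EAD = outstanding principal * (1 - predicted recovery rate)
EAD = (funded_amnt - total_rec_prncp) * (1 - recovery_rate)

# Expected Loss, as in the head section: EL = PD x LGD x EAD
EL = PD * LGD * EAD

# Compare against the EAD and EL printed in the row above
assert abs(EAD - 1797.828366) < 0.01
assert abs(EL - 454.196849) < 0.01
```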
# ## Evaluate null values
# from pyspark.sql.functions import isnan, when, count, col
# def fnc_count_null_values(dataframe1):
#     # skip date columns - isnan() is not defined for date/timestamp types
#     dataframe1.select([count(when(isnan(c) | col(c).isNull(), c)).alias(c)
#                        for c in dataframe1.columns
#                        if c not in ['issue_d_DT', 'earliest_cr_line_DT']]
#                       ).show(vertical=True)
# fnc_count_null_values(sdf3_credit_risk_modeling)
## backup - v2
sdf3_credit_risk_modeling.coalesce(1).write.mode('overwrite').format('parquet').save(
'/tmp/zdata_s3/credit_risk_modeling_full_1M_PBI.parquet'
)
# ## export sample with partition key for GitHub - small sample of data
sdf3_credit_risk_modeling.where(' issue_d_year = 2015 ').write.mode('overwrite').format(
'parquet').partitionBy('issue_d_year').save('/tmp/zdata_s3/credit_risk_modeling_github_sample.parquet')
sdf3_credit_risk_modeling.createOrReplaceTempView('TBP_RPT__PBI__CREDIT_RISK')
sql_qry = """
WITH TB_EL_RPT AS
(
SELECT loan_status
, loan_prediction_str
,sum(funded_amnt) as sum_funded_amnt
,sum( round( EL, 4)) as sum_EL
FROM TBP_RPT__PBI__CREDIT_RISK
WHERE 1 = 1
GROUP BY 1, 2
)
SELECT loan_status
, loan_prediction_str
, sum_funded_amnt
, ( sum_EL / sum_funded_amnt ) * 100 EL_pct
FROM TB_EL_RPT
WHERE 1 = 1
ORDER BY 3 DESC
"""
sql_sdf_current = spark.sql(sql_qry)
# sql_sdf_current.printSchema()
fhead(sql_sdf_current)
| | loan_status | loan_prediction_str | sum_funded_amnt | EL_pct |
|---|---|---|---|---|
| 0 | Current | Full Payment | 14044302400 | 11.090311 |
| 1 | Current | Default | 295299225 | 41.932573 |
### Join the default (PD) predictions with the full modeling table in Spark
# TBP_CREDIT_RISK_MODELING AND TB_CREDIT_RISK_PD_GLM_GBM
print(' ---------- data pipeline from raw to silver and bronze table ... ')
spark.sql(' SHOW TABLES ').show(truncate=False)
---------- data pipeline from raw to silver and bronze table ...
+---------+----------------------------+-----------+
|namespace|tableName                   |isTemporary|
+---------+----------------------------+-----------+
|         |tb_credit_risk_pd_glm_gbm   |true       |
|         |tb_credit_risk_pd_glm_gbm_v2|true       |
|         |tbp_credit_risk_modeling    |true       |
|         |tbp_loan_raw                |true       |
|         |tbp_rpt__pbi__credit_risk   |true       |
+---------+----------------------------+-----------+
sql_rpt = """
WITH TB_EL_RPT AS
(
SELECT loan_status
, loan_prediction_str
,sum(funded_amnt) as sum_funded_amnt
,sum( round( EL, 4)) as sum_EL
FROM TBP_RPT__PBI__CREDIT_RISK
WHERE 1 = 1
GROUP BY 1, 2
)
SELECT loan_status
, loan_prediction_str
, sum_funded_amnt / 1000 / 1000 as sum_funded_amnt_M
, sum_EL / 1000 / 1000 as sum_EL_M
, ( sum_EL / sum_funded_amnt ) * 100 EL_pct
FROM TB_EL_RPT
WHERE 1 = 1
ORDER BY 3 DESC
"""
sdf_rpt = spark.sql(sql_rpt)
fhead(sdf_rpt)
| | loan_status | loan_prediction_str | sum_funded_amnt_M | sum_EL_M | EL_pct |
|---|---|---|---|---|---|
| 0 | Current | Full Payment | 14044.302400 | 1557.556806 | 11.090311 |
| 1 | Current | Default | 295.299225 | 123.826564 | 41.932573 |
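The `EL_pct` column is simply the aggregated Expected Loss as a percentage of the funded amount, `(sum_EL / sum_funded_amnt) * 100`. Recomputing it from the millions columns in the report above (values copied from the output, so small rounding differences are expected):

```python
rows = [
    # (sum_funded_amnt_M, sum_EL_M, reported EL_pct)
    (14044.302400, 1557.556806, 11.090311),  # Current / Full Payment
    (295.299225, 123.826564, 41.932573),     # Current / Default
]

# EL_pct = (sum_EL / sum_funded_amnt) * 100; scaling both to millions cancels out
for funded_M, el_M, reported_pct in rows:
    el_pct = (el_M / funded_M) * 100
    assert abs(el_pct - reported_pct) < 0.001
```

Note how the loans the classifier flags as "Default" carry an expected loss of roughly 42% of the funded amount, versus about 11% for the "Full Payment" group.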
## Scatter plot with bubble size
sdf_rpt.pandas_api().plot.scatter(
    x='sum_funded_amnt_M', y='sum_EL_M',
    color='loan_prediction_str', size='EL_pct',
    title='Loan amount vs Expected Loss - in Millions'
)
!jupyter nbconvert --to html Credit_risk_modeling__Expected_Loss__EL__PD_LGD_EAD.ipynb